OCR exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Subspace models for document script and language identification

Internal identifier: 000672 (Main/Exploration); previous: 000671; next: 000673

Authors: T. N. Vikram [France]; K. Chidananda Gowda [India]

Source: International Journal of Imaging Systems and Technology, 20(2), 140–148, 2010

RBID : ISTEX:18B9CF840974D4B8413EFE9142CB843C6F9BE104

English descriptors

Abstract

In this article, we explore the suitability of subspace models like 2DPCA [Yang et al., IEEE Trans Pattern Anal Machine Intelligence 26 (2004), 131–137], 2DFLD [Yang et al., Pattern Recogn 38 (2005), 1125–1129], etc. for document script and language identification. They are employed to identify language and script at both paragraph and word level. Elaborate experimentation has been conducted which has revealed that they are robust enough to handle highly confusing scripts and their performance does not degrade drastically even in the presence of noise. A generic language identification has been attempted in this work, to identify languages of both Asian and European origin by considering a dataset of 20 different languages. © 2010 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 20, 140–148, 2010
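For readers unfamiliar with 2DPCA, the following is a minimal sketch of the general technique as formulated in the Yang et al. (2004) paper cited in the abstract. It is not the authors' pipeline: this record gives no details of their preprocessing, features, or classifier. The function names (`fit_2dpca`, `extract_features`, `classify`), the nearest-neighbour rule with a Frobenius distance, and the striped toy images standing in for document images are all assumptions made for illustration.

```python
import numpy as np


def fit_2dpca(images, n_components):
    """Learn a 2DPCA projection from equally sized grayscale images.

    images: array-like of shape (n_images, height, width).
    Returns (mean_image, projection), projection of shape (width, n_components).
    """
    stack = np.asarray(images, dtype=float)
    mean_image = stack.mean(axis=0)
    centered = stack - mean_image
    # Image scatter matrix G = (1/M) * sum_i (A_i - mean)^T (A_i - mean),
    # of shape (width, width); images are never flattened into vectors.
    scatter = np.einsum("mhi,mhj->ij", centered, centered) / stack.shape[0]
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigenvalues, eigenvectors = np.linalg.eigh(scatter)
    top = np.argsort(eigenvalues)[::-1][:n_components]
    return mean_image, eigenvectors[:, top]


def extract_features(image, projection):
    """Project one image onto the learned axes: Y = A X, shape (height, n_components)."""
    return np.asarray(image, dtype=float) @ projection


def classify(image, projection, train_features, train_labels):
    """Nearest-neighbour label (assumed rule) using the Frobenius distance."""
    query = extract_features(image, projection)
    distances = np.linalg.norm(train_features - query, axis=(1, 2))
    return train_labels[int(np.argmin(distances))]


# Toy stand-ins for two "scripts": vertical versus horizontal stripes plus noise.
rng = np.random.default_rng(0)
vertical = np.tile(np.arange(16) % 2, (16, 1)).astype(float)
horizontal = vertical.T
train_images = np.stack(
    [vertical + 0.1 * rng.standard_normal((16, 16)) for _ in range(10)]
    + [horizontal + 0.1 * rng.standard_normal((16, 16)) for _ in range(10)]
)
train_labels = ["vertical"] * 10 + ["horizontal"] * 10

_, projection = fit_2dpca(train_images, n_components=4)
train_features = np.stack([extract_features(img, projection) for img in train_images])
print(classify(vertical, projection, train_features, train_labels))
```

The point of the sketch is that 2DPCA works on the image matrices directly, so the scatter matrix is only width × width, which keeps the eigendecomposition small compared with vectorised PCA.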

URL: https://api.istex.fr/document/18B9CF840974D4B8413EFE9142CB843C6F9BE104/fulltext/pdf
DOI: 10.1002/ima.20215



The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Subspace models for document script and language identification</title>
<author>
<name sortKey="Vikram, T N" sort="Vikram, T N" uniqKey="Vikram T" first="T. N." last="Vikram">T. N. Vikram</name>
</author>
<author>
<name sortKey="Gowda, K Chidananda" sort="Gowda, K Chidananda" uniqKey="Gowda K" first="K. Chidananda" last="Gowda">K. Chidananda Gowda</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:18B9CF840974D4B8413EFE9142CB843C6F9BE104</idno>
<date when="2010" year="2010">2010</date>
<idno type="doi">10.1002/ima.20215</idno>
<idno type="url">https://api.istex.fr/document/18B9CF840974D4B8413EFE9142CB843C6F9BE104/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000973</idno>
<idno type="wicri:Area/Istex/Curation">000962</idno>
<idno type="wicri:Area/Istex/Checkpoint">000252</idno>
<idno type="wicri:doubleKey">0899-9457:2010:Vikram T:subspace:models:for</idno>
<idno type="wicri:Area/Main/Merge">000677</idno>
<idno type="wicri:Area/Main/Curation">000672</idno>
<idno type="wicri:Area/Main/Exploration">000672</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Subspace models for document script and language identification</title>
<author>
<name sortKey="Vikram, T N" sort="Vikram, T N" uniqKey="Vikram T" first="T. N." last="Vikram">T. N. Vikram</name>
<affiliation wicri:level="3">
<country xml:lang="fr">France</country>
<wicri:regionArea>GREYC, Université de Caen. 6, Boulevard du Maréchal Juin, 14050 CAEN CEDEX</wicri:regionArea>
<placeName>
<region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Basse-Normandie</region>
<settlement type="city">CAEN</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Gowda, K Chidananda" sort="Gowda, K Chidananda" uniqKey="Gowda K" first="K. Chidananda" last="Gowda">K. Chidananda Gowda</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Inde</country>
<wicri:regionArea>International School of Information Management, University of Mysore, 3004, “Udayaravi” 5th Main, 12th Cross V. V. Puram, Mysore, Karnataka</wicri:regionArea>
<wicri:noRegion>Karnataka</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">International Journal of Imaging Systems and Technology</title>
<title level="j" type="abbrev">Int. J. Imaging Syst. Technol.</title>
<idno type="ISSN">0899-9457</idno>
<idno type="eISSN">1098-1098</idno>
<imprint>
<publisher>Wiley Subscription Services, Inc., A Wiley Company</publisher>
<pubPlace>Hoboken</pubPlace>
<date type="published" when="2010-06">2010-06</date>
<biblScope unit="volume">20</biblScope>
<biblScope unit="issue">2</biblScope>
<biblScope unit="page" from="140">140</biblScope>
<biblScope unit="page" to="148">148</biblScope>
</imprint>
<idno type="ISSN">0899-9457</idno>
</series>
<idno type="istex">18B9CF840974D4B8413EFE9142CB843C6F9BE104</idno>
<idno type="DOI">10.1002/ima.20215</idno>
<idno type="ArticleID">IMA20215</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0899-9457</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>2DFLD</term>
<term>2DPCA</term>
<term>OCR</term>
<term>document image processing</term>
<term>language identification</term>
<term>script identification</term>
<term>subspace models</term>
</keywords>
</textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">In this article, we explore the suitability of subspace models like 2DPCA [Yang et al., IEEE Trans Pattern Anal Machine Intelligence 26 (2004), 131–137], 2DFLD [Yang et al., Pattern Recogn 38 (2005), 1125–1129], etc. for document script and language identification. They are employed to identify language and script at both paragraph and word level. Elaborate experimentation has been conducted which has revealed that they are robust enough to handle highly confusing scripts and their performance does not degrade drastically even in the presence of noise. A generic language identification has been attempted in this work, to identify languages of both Asian and European origin by considering a dataset of 20 different languages. © 2010 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 20, 140–148, 2010</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
<li>Inde</li>
</country>
<region>
<li>Basse-Normandie</li>
<li>Région Normandie</li>
</region>
<settlement>
<li>CAEN</li>
</settlement>
</list>
<tree>
<country name="France">
<region name="Région Normandie">
<name sortKey="Vikram, T N" sort="Vikram, T N" uniqKey="Vikram T" first="T. N." last="Vikram">T. N. Vikram</name>
</region>
</country>
<country name="Inde">
<noRegion>
<name sortKey="Gowda, K Chidananda" sort="Gowda, K Chidananda" uniqKey="Gowda K" first="K. Chidananda" last="Gowda">K. Chidananda Gowda</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
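
As an illustration of how the fields of such a record might be read programmatically, here is a small sketch using Python's standard `xml.etree.ElementTree`. It works on a shortened excerpt of the record, not the full text. The record uses the `wicri:` prefix without declaring a namespace, so the sketch binds it on a wrapper element; the URI `urn:example:wicri`, the `<wrapper>` element, and the function name `parse_record` are assumptions, not part of Wicri or Dilib.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the record (most elements and attributes omitted).
RECORD_EXCERPT = """<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<publicationStmt>
<idno type="doi">10.1002/ima.20215</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Subspace models for document script and language identification</title>
<author>
<name sortKey="Vikram, T N">T. N. Vikram</name>
</author>
<author>
<name sortKey="Gowda, K Chidananda">K. Chidananda Gowda</name>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>2DFLD</term>
<term>2DPCA</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
</TEI>
</record>"""


def parse_record(xml_text, wicri_uri="urn:example:wicri"):
    """Return title, authors, DOI and keywords from one record string."""
    # Bind the undeclared "wicri:" prefix on a wrapper element so the parser
    # accepts it; the URI is a placeholder, not the real namespace.
    wrapped = f'<wrapper xmlns:wicri="{wicri_uri}">{xml_text}</wrapper>'
    record = ET.fromstring(wrapped).find("record")
    analytic = record.find(".//analytic")
    return {
        "title": analytic.findtext("title"),
        "authors": [name.text for name in analytic.findall("author/name")],
        "doi": record.findtext(".//idno[@type='doi']"),
        "keywords": [term.text for term in record.findall(".//keywords/term")],
    }


metadata = parse_record(RECORD_EXCERPT)
print(metadata["title"], metadata["doi"])
```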

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000672 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000672 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:18B9CF840974D4B8413EFE9142CB843C6F9BE104
   |texte=   Subspace models for document script and language identification
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024